A Software Tool for Dataflow Graph Scheduling

Robert L. Jones III 
NASA Langley Research Center 

A graph-theoretic design process and software tool are presented for selecting a
multiprocessing scheduling solution for a class of computational problems. The problems of
interest are those that can be described using a dataflow graph and are intended to be executed 
repetitively on multiple processors. The dataflow paradigm is very useful in exposing the 
parallelism inherent in algorithms. It provides a graphical and mathematical model which 
describes a partial ordering of algorithm tasks based on data precedences. That is, some tasks
must execute in a particular order whereas other tasks may execute independently of other tasks.
Dataflow graph nodes represent schedulable tasks and edges represent the data dependencies
between the tasks. Analysis of the dataflow graph is possible for many digital signal
processing (DSP) and control law algorithms which are deterministic. For determinism, the 
model is applicable to a class of dataflow graphs in which task execution times are assumed
constant from iteration to iteration when the tasks are executed on a set of identical processors. It is also
assumed that the dataflow graph is data independent. Any decisions present within the
computational problem must be contained within the graph nodes rather than described at the 
graph level. Special transitions called sources and sinks are also provided to model the input and 
output data streams of the task system. The presence of data is indicated by marking the dataflow 
graph with tokens. The graph transitions through markings as a result of a sequence of node 
firings. A node is enabled for firing when a token is available on every input edge of the node 
indicating that the task has all of its operands. When the node fires, it encumbers one token from 
each of its input edges, delays an amount of time equal to the task latency, and then deposits one 
token on each of its output edges. Sources and sinks have special firing rules in that sources are 
unconditionally enabled for firing and sinks consume tokens, but do not produce any. By 
analyzing the dataflow graph in terms of its critical path, critical circuit, dataflow schedule, and
the token bounds within the graph, the performance characteristics and resource requirements can
be determined a priori.
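The firing rule described above can be sketched in a few lines of Python. This is a minimal sketch that tracks only the marking (the token count on each edge) and deliberately ignores task latency; the class and method names (`DataflowGraph`, `add_edge`, `is_enabled`, `fire`) are hypothetical and are not part of the Dataflow Design Tool.

```python
from collections import defaultdict


class DataflowGraph:
    """Marked dataflow graph sketch: nodes are tasks, edges hold token counts.

    Timing is ignored; only the marking changes when a node fires.
    """

    def __init__(self):
        self.inputs = defaultdict(list)   # node -> ids of incoming edges
        self.outputs = defaultdict(list)  # node -> ids of outgoing edges
        self.tokens = []                  # edge id -> current token count
        self.sources = set()              # unconditionally enabled nodes
        self.sinks = set()                # nodes that take tokens but never produce

    def add_edge(self, src, dst, initial_tokens=0):
        edge_id = len(self.tokens)
        self.tokens.append(initial_tokens)
        self.outputs[src].append(edge_id)
        self.inputs[dst].append(edge_id)
        return edge_id

    def is_enabled(self, node):
        # Sources are always enabled; any other node needs a token on
        # every input edge (i.e., all of its operands are available).
        if node in self.sources:
            return True
        return all(self.tokens[e] > 0 for e in self.inputs[node])

    def fire(self, node):
        if not self.is_enabled(node):
            raise RuntimeError(f"node {node!r} is not enabled")
        for e in self.inputs[node]:
            self.tokens[e] -= 1           # take one token from each input edge
        if node not in self.sinks:
            for e in self.outputs[node]:
                self.tokens[e] += 1       # put one token on each output edge


# Example: a source feeds A and B, both feed C, and C feeds a sink.
g = DataflowGraph()
g.sources.add("in")
g.sinks.add("out")
for src, dst in [("in", "A"), ("in", "B"), ("A", "C"), ("B", "C"), ("C", "out")]:
    g.add_edge(src, dst)

g.fire("in")              # one token now waits on each of in->A and in->B
g.fire("A")
print(g.is_enabled("C"))  # False: C still lacks its operand from B
g.fire("B")
print(g.is_enabled("C"))  # True
```

Adding a per-node latency and an event queue on top of this marking logic would give the timed behavior the model describes; it is left out here to keep the enabling rule visible.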

As for any mathematical model, there is a need for efficient software tools which facilitate 
the use of the model in solving problems. A software program, referred to as the Dataflow 
Design Tool, was developed at Langley to apply the dataflow model and design multiprocessor 
solutions for spaceborne applications. The tool was written in C++ for Microsoft Windows 3.1 or
NT and can be hosted on an i386/486 personal computer or compatible. The Design Tool takes input
from a text file which specifies the topology and attributes of the dataflow graph. A Graph Tool 
was developed to facilitate the creation of the graph text file. The various displays and features 
are shown to provide an automated and user-interactive design process which facilitates the 
selection of a multiprocessor solution based on dataflow analysis. Performance metrics 
determined automatically by the Dataflow Design Tool include the minimum time to execute all
tasks for a given computation (schedule length), the minimum time between graph input and the
corresponding output (TBIO_lb), the minimum graph-imposed iteration period (T_0), and the
minimum time between outputs (TBO_lb). The tool also determines whether tasks can be delayed by a
finite amount of time without degrading performance, referred to as slack time. Since the edges imply
the physical storage of data, the tool can calculate the minimum data buffers required for proper
sharing of data between tasks. In addition to numerical performance metrics, the tool graphically
portrays system behavior using Gantt charts and resource envelopes. The Single Graph Play (SGP)
displays the steady-state task schedule associated with a single computation, and the Total Graph
Play (TGP) displays the periodic, steady-state task schedule over a single iteration period.
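Two of these bounds have simple graph-theoretic forms that can be sketched directly. The Python below is illustrative, not the tool's algorithm: it takes the longest latency-weighted path through edges that carry no initial tokens (a lower bound on the schedule length when processors are unlimited), and computes the iteration bound T_0 as the largest ratio of circuit latency to circuit tokens over all circuits, the standard critical-circuit bound for graphs of this kind. The function names and the small example graph are hypothetical.

```python
from fractions import Fraction
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path_length(latency, edges):
    """Longest latency-weighted path using only edges with no initial tokens.

    latency: dict task -> execution time
    edges:   list of (src, dst, initial_tokens)
    Raises graphlib.CycleError if the token-free edges form a circuit.
    """
    preds = {task: set() for task in latency}
    for src, dst, tokens in edges:
        if tokens == 0:               # edges with initial tokens carry data
            preds[dst].add(src)       # from a previous iteration
    finish = {}
    for task in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[task]), default=0)
        finish[task] = start + latency[task]
    return max(finish.values())


def iteration_bound(latency, edges):
    """T_0 = max over circuits of (sum of task latencies / tokens on circuit).

    Enumerates simple circuits by depth-first search, so it suits small graphs.
    """
    succ = {task: [] for task in latency}
    for src, dst, tokens in edges:
        succ[src].append((dst, tokens))
    rank = {task: i for i, task in enumerate(latency)}
    best = Fraction(0)

    def extend(start, node, lat, toks, on_path):
        nonlocal best
        for nxt, tok in succ[node]:
            if nxt == start:          # circuit closed
                if toks + tok == 0:
                    raise ValueError("token-free circuit: the graph deadlocks")
                best = max(best, Fraction(lat, toks + tok))
            elif rank[nxt] > rank[start] and nxt not in on_path:
                on_path.add(nxt)
                extend(start, nxt, lat + latency[nxt], toks + tok, on_path)
                on_path.remove(nxt)

    # Each circuit is found exactly once, starting from its lowest-ranked task.
    for task in latency:
        extend(task, task, latency[task], 0, {task})
    return best


# Hypothetical first-order recursion: add -> mul carries one initial token (z^-1).
latency = {"add": 100, "mul": 200, "out": 10}
edges = [("add", "mul", 1), ("mul", "add", 0), ("add", "out", 0)]
print(critical_path_length(latency, edges))  # 310
print(iteration_bound(latency, edges))       # 300
```

The circuit add -> mul -> add holds 300 time units of work but only one token, so no schedule can start iterations more often than every 300 time units, however many processors are available.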

The analysis and multiprocessor mapping of a finite impulse response (FIR) filter is 
illustrated. A linear phase FIR filter is selected since it requires half the number of multiplies of 
other FIR realizations. DSP problems are very suitable for dataflow analysis since they are 
typically described as signal flow graphs. One can easily translate signal flow graphs to dataflow
graphs by locating the computations (addition and multiplication) and representing unit delays
(z^-1 terms) with initial tokens. Once the filter has been captured in the Graph Tool, it can
be analyzed by the Dataflow Design Tool to expose the inherent parallelism and determine graph- 
theoretic performance bounds. Since there are many realizations of the same filter, characterized 
by different dataflow graphs, the Dataflow Design Tool can be useful in determining which
realization exposes the most parallelism. The SGP shows that some of the additions can execute
in parallel (C1 through C4), enabling the parallel execution of the multiplies and finally the
sequential summation to generate the output sample. The SGP bars are shaded to depict the read, 
process, and write activities of the processor, and the hollow bars denote slack time associated 
with some tasks. In addition to the parallel concurrency, the TGP shows pipeline concurrency 
that may be exploited. In this example, the TGP shows that at most 4 different data samples may 
be computed within a sampling period of 224 time units. The Total Resource Envelope shows 
that 10 processors are required to achieve this level of throughput. The dataflow analysis applied 
to the dataflow graph and portrayed in the graph play diagrams assumes infinite resources
(processors and memory) so that the exposed parallelism is limited only by the data precedences.
If there are not enough resources to exploit the inherent parallelism, the schedule must be
optimized. As an example, assume that a fully-static schedule (i.e., a task will execute on the
same processor for every iteration) on 8 processors is desirable to minimize run-time overhead.
The Dataflow Design Tool shows that such a solution can be achieved by inserting two additional
“artificial” data dependencies and increasing the sampling period to 260 time units. The tool can
also display the periodic memory accesses within a periodic schedule. Such an assessment may be
useful for optimizing the schedule based on the limited bandwidth between processors or between
processors and memory. Once a desirable solution is obtained, the tool can summarize the scheduling
constraints in terms of earliest start (ES), latest finish (LF), and slack time. The summary of run-time
requirements includes task instantiations (INST), defined as the number of processors a task
will have to execute on simultaneously for different data sets. For a fully-static schedule, one
would expect all instantiations to be 1, as shown. Also, the buffer sizes (QUEUE) for shared data
are given along with the number of initially empty buffers (OE) and the number of initially full
buffers (OF) due to initial data. 
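The ES, LF, and slack quantities in such a summary follow the usual forward and backward passes over an acyclic precedence graph. The sketch below assumes a single computation (one pass through the graph, as in the Single Graph Play) and a completion horizon equal to the critical-path length unless one is supplied; the function name, task names, and latencies are hypothetical, and the tool's own procedure, which also accounts for the periodic schedule, may differ.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def schedule_constraints(latency, preds, horizon=None):
    """Earliest start, latest finish and slack for an acyclic task graph.

    latency: dict task -> execution time
    preds:   dict task -> iterable of predecessor tasks
    horizon: completion time to schedule against (defaults to critical path)
    """
    order = list(TopologicalSorter(preds).static_order())
    succs = {task: [] for task in order}
    for task in order:
        for p in preds.get(task, ()):
            succs[p].append(task)

    # Forward pass: a task may start once all predecessors have finished.
    es = {}
    for task in order:
        es[task] = max((es[p] + latency[p] for p in preds.get(task, ())), default=0)
    if horizon is None:
        horizon = max(es[t] + latency[t] for t in order)

    # Backward pass: a task must finish before any successor's latest start.
    lf = {}
    for task in reversed(order):
        lf[task] = min((lf[s] - latency[s] for s in succs[task]), default=horizon)

    slack = {t: lf[t] - latency[t] - es[t] for t in order}
    return es, lf, slack


# Hypothetical diamond: A feeds B and C, which both feed D.
latency = {"A": 100, "B": 200, "C": 100, "D": 100}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
es, lf, slack = schedule_constraints(latency, preds)
for task in sorted(latency):
    print(task, es[task], lf[task], slack[task])
```

Tasks with zero slack (A, B, D here) lie on the critical path; C's 100 time units of slack is the room a scheduler has to delay it, for example to share a processor, without lengthening the computation.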

In summary, the dataflow paradigm provides a general model suitable for exposing
parallelism inherent in algorithms ranging from fine-grain ones, such as filters, to more computationally complex
algorithms where a node might represent an entire filter. When the algorithm is deterministic, the
Dataflow Design Tool can analytically determine the steady-state behavior, performance bounds, 
scheduling constraints, and resource requirements. By permitting the user to insert artificial data 
dependencies, the dataflow schedule can be optimized to match resource requirements with 
resource availability. 
Eighth-Order, Linear Phase FIR Filter
A DSP signal flow graph is a dataflow graph in which the z^-1 unit delays can be
modeled with initial tokens. Thus, the run-time implementation of delays does not
incur any overhead. Unit delays are simply implemented by initializing FIFO
queues used for intermediate data.
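The claim that unit delays cost nothing at run time can be illustrated with a FIFO delay line pre-loaded with zeros, one zero per z^-1 delay, playing the role of the initial tokens. The sketch also uses the linear-phase symmetry to pre-add pairs of samples, so each output needs only half as many multiplies as the direct form. It assumes an even-length (8-tap) symmetric impulse response for illustration; an odd-length filter would add a single center tap. The function names and coefficients are hypothetical.

```python
from collections import deque


def linear_phase_fir(h_half, samples):
    """Even-length symmetric FIR, h[k] == h[N-1-k] with N = 2 * len(h_half).

    The delay line is a FIFO pre-loaded with N-1 zeros: one "initial token"
    per z^-1 unit delay, so no extra work is needed to implement the delays.
    """
    n_taps = 2 * len(h_half)
    delay_line = deque([0.0] * (n_taps - 1), maxlen=n_taps - 1)
    outputs = []
    for x in samples:
        window = [x, *delay_line]                 # window[k] == x[n - k]
        # Pre-add symmetric pairs, then one multiply per pair (N/2 multiplies).
        y = sum(hk * (window[k] + window[n_taps - 1 - k])
                for k, hk in enumerate(h_half))
        outputs.append(y)
        delay_line.appendleft(x)                  # oldest sample falls off the end
    return outputs


def direct_fir(h, samples):
    """Reference direct-form convolution with N multiplies per output."""
    return [sum(h[k] * samples[n - k] for k in range(len(h)) if n >= k)
            for n in range(len(samples))]


h_half = [1.0, 2.0, 3.0, 4.0]
h = h_half + h_half[::-1]                         # [1, 2, 3, 4, 4, 3, 2, 1]
x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.0, -1.0, 2.0, 4.0, 0.25]
print(linear_phase_fir(h_half, x) == direct_fir(h, x))   # True
```

With 8 taps this form has 4 pre-additions, 4 multiplies, and 3 accumulating additions per output sample.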






Dataflow Graph Capture of FIR Filter 



Assumptions: Shared memory with no contention
Multiplies take 200 time units
Additions take 100 time units
One-operand reads/writes take 10 time units
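As a worked example of how these assumptions might turn into task latencies, the sketch below charges each task for its operand reads, its computation, and its result write. The operand counts (two reads and one write per addition; one data read and one write per multiply, with the coefficient assumed to be held locally) and the wiring of tasks C1 through C11 (pre-additions C1 through C4 feeding multiplies C5 through C8, followed by a sequential summation C9, C10, C11) are assumptions inferred from the slides, not values taken from the tool.

```python
from functools import lru_cache

READ = WRITE = 10            # one-operand read or write (slide assumption)
MULTIPLY, ADD = 200, 100     # computation times (slide assumptions)


def task_latency(compute, reads, writes):
    """Read operands, compute, then write the result (no memory contention)."""
    return reads * READ + compute + writes * WRITE


# Assumed operand counts; the coefficient is taken to be resident in each
# multiplying processor, so a multiply reads only one data operand.
ADD_LAT = task_latency(ADD, reads=2, writes=1)        # 130
MUL_LAT = task_latency(MULTIPLY, reads=1, writes=1)   # 220

latency = {f"C{i}": ADD_LAT for i in (1, 2, 3, 4, 9, 10, 11)}
latency.update({f"C{i}": MUL_LAT for i in (5, 6, 7, 8)})

# Assumed wiring: pre-adds -> multiplies -> sequential summation.
preds = {"C5": ["C1"], "C6": ["C2"], "C7": ["C3"], "C8": ["C4"],
         "C9": ["C5", "C6"], "C10": ["C9", "C7"], "C11": ["C10", "C8"]}


@lru_cache(maxsize=None)
def finish(task):
    """Earliest finish time of `task` with unlimited processors."""
    return max((finish(p) for p in preds.get(task, [])), default=0) + latency[task]


print(ADD_LAT, MUL_LAT)                  # 130 220
print(max(finish(t) for t in latency))   # 740 (e.g. C1 -> C5 -> C9 -> C10 -> C11)
print(sum(latency.values()))             # 1790 time units of work per sample
```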
Exposing the Parallelism in the FIR Filter



Optimization for 8 Processors 

A fully-static schedule is desired for minimum run-time overhead.
Fully-Static Processor Requirements


Sampling Period = 260 time units 


Processor Assignments 

P1 {C1+, C4+}

P2 {C2+, C3+} 

P3 {C5*} 

P4 {C6*} 

P5 {C7*}

P6 {C8*} 

P7 {C9+, C10+} 

P8 {C11+} 


Total of 8 DSP Chips are Required 
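A quick necessary condition for a fully-static assignment is that no processor is busy for longer than one sampling period. The check below applies it to the processor assignments above, using assumed latencies of 130 time units per addition (two reads, the add, one write) and 220 per multiply (one read, the multiply, one write); these latencies are assumptions, and passing the check does not by itself prove that the precedence constraints can be met within the period.

```python
PERIOD = 260                       # sampling period from the slide
ADD_LAT, MUL_LAT = 130, 220        # assumed: 2 reads + add + 1 write; 1 read + multiply + 1 write
MULTIPLIES = {"C5", "C6", "C7", "C8"}

assignment = {
    "P1": ["C1", "C4"], "P2": ["C2", "C3"], "P3": ["C5"], "P4": ["C6"],
    "P5": ["C7"], "P6": ["C8"], "P7": ["C9", "C10"], "P8": ["C11"],
}


def busy_time(tasks):
    """Time a processor spends on its tasks in one sampling period."""
    return sum(MUL_LAT if t in MULTIPLIES else ADD_LAT for t in tasks)


loads = {proc: busy_time(tasks) for proc, tasks in assignment.items()}
for proc, load in loads.items():
    print(f"{proc}: {load}/{PERIOD} busy")
print("fits:", all(load <= PERIOD for load in loads.values()))
print(f"utilization: {sum(loads.values()) / (len(loads) * PERIOD):.2f}")
```

Under these assumptions, P1, P2, and P7 are occupied for the entire 260-unit period, and the eight processors together are about 86% utilized.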



Analysis of Memory Access 


Optimized schedule has a better distribution of memory accesses, which can
be accommodated by the 6 independent communication ports in the TMS320C40s.


Unoptimized Schedule 



Too many localized memory 
references! 


Optimized Schedule 
Memory references are more
evenly distributed. 
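One way to quantify "too many localized memory references" is to fold each memory access of one computation onto the sampling period and count how many accesses overlap at any instant; the peak can then be compared with the number of available ports. The sketch below assumes each access is a (start, duration) interval no longer than the period; the function name, the port count constant, and the example intervals are hypothetical, not data from the slides.

```python
def peak_concurrent_accesses(accesses, period):
    """Maximum number of overlapping memory accesses in a periodic schedule.

    accesses: (start, duration) intervals from one computation, which may span
              several periods; each duration must be <= period.
    Each interval is folded onto [0, period); intervals are half-open.
    """
    events = []
    for start, duration in accesses:
        s = start % period
        e = s + duration
        if e <= period:
            events += [(s, +1), (e, -1)]
        else:                                    # wraps into the next period
            events += [(s, +1), (period, -1), (0, +1), (e - period, -1)]
    events.sort()                                # at equal times, -1 sorts before +1
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


PORTS = 6   # independent communication ports assumed available
accesses = [(0, 10), (0, 10), (5, 10), (255, 10), (120, 10)]   # hypothetical
peak = peak_concurrent_accesses(accesses, period=260)
print(peak, "<=", PORTS, ":", peak <= PORTS)
```

Here the access starting at 255 wraps into the next period and overlaps the two accesses at time 0, giving a peak of 3 concurrent accesses.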
Summary of Fully-Static Multiprocessor Solution

FIR Filter 
Summary

• Dataflow provides a general model of computation 
capable of exposing fine- and large-grain 
parallelism. 

• Design Tool provides analytic, compile-time 
prediction of: 

- Steady-state behavior 

- Graph-theoretic performance bounds

- Iterative run-time scheduling criteria 

• Permits inclusion of artificial precedences for 
optimization. 

• Facilitates selection of static run-time schedules. 
Use of Software through Pictures on CERES


The CERES team has been using the Yourdon/DeMarco Structured Analysis/Structured Design
methodology to develop the data management system for producing higher order science data 
products from CERES instrument data. As part of this effort, the team is using the Software 
through Pictures CASE tool to automate portions of the methodology. This presentation 
addresses the team’s experiences with the selected methodology and CASE tool, describes lessons 
learned, and provides recommendations for other teams contemplating the use of structured
methodologies and CASE tools.


Software engineering methodologies can help developers create systems in less time with higher
reliability and quality by providing tools for managing the complexity inherent in software systems
and development programs. CASE tools can facilitate using a methodology by providing
tools for creating and maintaining requirements and design models, automating consistency and
completeness checking, and automating much of the bookkeeping associated with following the
methodology. This allows developers to focus on the creative aspects of software design and
development.


Overall, our experience on CERES has been that structured methodologies and CASE tools prove
useful in creating, maintaining, and documenting high-quality requirements and analysis products.
Although the learning curves associated with these tools require an investment in time and training
early on, the benefits to be gained are well worth the effort, and our productivity continues to
increase as we become more familiar with the methodology.


To date, the CERES data management team has used the tool to model more than 130 data products
down to the level of atomic variables, define each data element in terms of type, units, accuracy,
and number of bits, and create documentation from the information stored in the models.

Since the CERES system is primarily a science data processing system which generates more than
5 terabytes of data per month, focusing on the system’s data products has led to a deeper understanding
of processing needs and resulted in higher quality functional requirements. Furthermore,
the graphical editors and consistency checking features provided by the tool have allowed the
team to rapidly iterate through the modelling process in less time than would have been required
without the tool.


The data management team is currently analyzing system functional requirements by modelling
the functionality needed to process instrument and higher order science data. Here again, the tool
speeds up the process of iterating on the model to converge on a final solution. In addition, the
tool has allowed the team to automatically produce software requirements documents in a standard
format from information contained in the CASE tool database.

We have incorporated several customizations in order to tailor the CASE tool to support the specific
processes employed on CERES. These customizations include creating templates for CERES-specific
documentation, enhancing the CASE tool main menu, and integrating
the CASE tool with the FrameMaker desktop publishing package. The CASE tool is supplied
with templates for producing documentation that complies with military software standards.
Since these standards were not appropriate for NASA publications, we developed templates for
several documents, including a Software Requirements Document, Data Products Catalog, and
Data Dictionary, as well as several utilities to provide hard copies of details stored in the tool’s
database for developers’ use in reviewing their models. We have also modified the tool’s main
menu to simplify the user interface for creating documents. Finally, there are several places in the
tool where the developer adds detail to the requirements or design model by entering free-form
text. These items include functional descriptions, data product descriptions, and interface
descriptions. The CASE tool only supports ASCII text and, since much of our processing is
described in terms of equations, tables, and graphics, this restriction limited our ability to fully
describe the necessary processing. Therefore, we have modified the tool to allow the use of
FrameMaker (desktop publishing/word processor) for entering descriptions of functions, data
products, and interfaces. This allows a designer to include any combination of text, graphics,
tables, and equations in these descriptions, which are then included directly in the documentation
produced using the tool.

Our experience indicates that when combined with well-structured methodologies, CASE tools
can provide an important component of a development environment which helps designers create
software products with higher quality in less time. However, the key to achieving productivity
gains is the process used to design the software. The processes incorporated in structured analysis
and structured design provide a sound framework for creating complex software systems and
must be adopted in order to derive any benefits from the use of automated tools such as Software
through Pictures.
INTRODUCTION


• CERES Overview 

• Software Development Methodology 

• CASE Tool Capabilities and Configuration 

• Experiences to Date 

• Lessons Learned/Recommendations 
CERES OVERVIEW


• Scientific Data Processing Application 

• Approximately 500K Source Lines of Code 

• Organized into 12 Subsystems (CSCIs) 

• Generates More Than 5 TeraBytes of Data per Month 

• Operates within the EOSDIS Environment 

• Languages include FORTRAN, C, Ada 
METHODOLOGY (cont’d)


• Structured Analysis/Structured Design 

• Model Based Approach 

• Emphasis on Early Life Cycle Phases 

• Requirements - Model Functionality

• Design - Model Architecture of Solution
CASE TOOL CAPABILITIES

• Automate Portions of Methodology so Developers can 
Focus on Creative Aspects of Software Design 

• Provide Tools to: 

• Rapidly Create and Modify Models 

• Capture Models in Central Repository 

• Check Model Validity (Completeness, Consistency) 

• Support Multiple Developers in Work Group Environment 

• Create Documentation from Models in Repository 
EXPERIENCES TO DATE


• Achievements 

• Data Products Modelled (incl. Data Structure and Data Description 
Details) 

• Data Product Catalogs Generated from Data Models (Sizing 
Analysis Computed by Tool) 

• Currently Modelling Each Subsystem 

• Automatically Produce Requirements Documentation in Standard 
Format for Each Subsystem 
EXPERIENCES (cont’d)

• Issues 

- Multiple-Site Configuration Complicated System Administration
Functions

- Document Definition in Parallel with Template Development
Resulted in Excessive Template Iterations

- “Loose” Configuration Management of Customizations Created
Synchronization Problems Among Multiple Sites

- Lack of CASE/Methodology Expertise at Each Site Slowed CASE
Tool Adoption
EXPERIENCES (cont’d)


• Quality Action Team Established by NASA to Address 
Concerns, Improve CASE Tool Use 

• Membership From All Organizations 

• Results to Date Have Been Very Positive 



• Enhanced Understanding and Awareness of Concerns Among 
Organizations 


• Simplified System Administration Process and Configuration 

• Established Forums for Information Dissemination, Identified
Training Needs, Conducted Training


• Improved Development, Test, CM Process for Customizations 



LESSONS LEARNED/RECOMMENDATIONS 


• CASE Tool Introduction Represents Potential Culture 
Change 



• Strong Management Support Required 

• Steering Committee Useful for Coordinating Adoption Process 

• Timely Dissemination of Information Necessary, Exploit Electronic 
Communications Media (e-mail, bulletin boards, WWW) 

• Methodology is Key Element, Training is Critical 


• CASE is Engineering Tool, Documentation is By-Product 

• Customizations Represent Development of Utility Codes, Should Use 
Structured Development Process 
SESSION 4 Solutions of Equations
Chaired by 
Olaf Storaasli 


4.1 Rapid Solution of Large-Scale Systems of Equations - Olaf Storaasli

4.2 Solution of Matrix Equations Using Sparse Techniques - Majdi Baddourah

4.3 Equation Solvers for Distributed Memory Computers - Olaf Storaasli 